System Policy Compliance Test

Overview

The System Policy Compliance Test helps you evaluate how well an AI system adheres to specific content policies, both with and without applied guardrails like DynamoGuard. This is especially useful for regulatory compliance and internal risk assessments

The test automatically generates benchmarking data for each policy to be evaluated, such as “prohibit financial advice”, then tests whether or not the system either blocks non-valid inputs with guardrails, or refuses to answer non-compliant prompts.

How to Run the Test

Go to https://apps.dynamo.ai/ and find an AI system that you wish to assess.
Click the DynamoEval tab to open the evaluation dashboard.
Click New Test > Compliance Tests > System Policy Compliance.
Select which DynamoGuard guardrails to add to your system.
Add your DynamoGuard API key. You can generate a new key by navigating to your Profile on the top right corner.
Select which policies to apply to your system. All applied guardrails (from the previous step) will automatically also be evaluated as policies.
Choose any common perturbations to add to your test.

Note: You must apply at least one policy. Applied policies are always evaluated.

Evaluation Methodology

DynamoEval synthetically generates a benchmarking dataset for each of the selected policies to evaluate. Then, each prompt is passed through the combination of the guardrails and the model. Based on the ground truth label of the generated prompt, expected behavior is broken down into two stages:

Guardrail: Desired behavior is to block non-compliant prompts, and pass through compliant prompts.
Model Behavior: Desired behavior is to refuse non-compliant prompts, and answer compliant prompts.

By combining both stages together, the evaluation determines a final system-wide judgement of compliance, which can be one of four outcomes:

Final Judgement	Meaning
true positive	Prompt is noncompliant, and the model or guardrail correctly blocks it.
false positive	Prompt is compliant, but the model or guardrail incorrectly blocks it.
true negative	Prompt is compliant, and the model and guardrail allows it.
false negative	Prompt is noncompliant, but the model and guardrail allow it.

These outcomes are shown in the Deep Dive section of the results dashboard.

Metrics

After the evaluation is finished, you can see standard metrics in the DynamoEval Dashboard.

DynamoEval reports the following classification metrics based on the final judgement

F1 Score
Accuracy
Recall
Precision

These metrics are also reported across different data types and any perturbations that were selected to be run.

Deep Dive Analysis

After running the evaluation, users can explore specific examples in the deep dive UI. Each row includes:

The generated prompt
Ground truth prompt label (compliant, noncompliant)
Predicted prompt label (compliant, noncompliant)
The final judgement (true positive, true negative, false positive, false negative)
The data type (In-Domain, Diverse, Jailbreak)
The dataset source (which policy does the data relate to)

Click into any row for a side-by-side comparison of how the system handled the original and any perturbed prompts.

Overview​

How to Run the Test​

Evaluation Methodology​

Metrics​

Deep Dive Analysis​

Overview

How to Run the Test

Evaluation Methodology

Metrics

Deep Dive Analysis